MLCommons, a nonprofit AI safety working group, has teamed up with AI dev platform Hugging Face to release one of the world’s largest collections of public domain voice recordings for AI research. The data set, called Unsupervised People’s Speech, contains more than a million hours of audio spanning at least 89 different languages. MLCommons says […] © 2024 TechCrunch. All rights reserved. For personal use only.
This TechCrunch article describes the release of a massive open-source speech dataset called Unsupervised People's Speech by MLCommons and Hugging Face.
The core idea is:
* Expanding AI research possibilities: The dataset, containing more than a million hours of audio spanning at least 89 languages, aims to fuel innovation in speech technology, especially for under-resourced languages.
However, the article also highlights ethical concerns:
* Data bias: The dataset's majority English-language content might perpetuate existing biases in AI systems trained on it.
* Lack of consent: The recordings come from Archive.org, which raises questions about whether everyone whose voice is included consented to its use in AI research.
Ultimately, the article presents a complex picture: exciting potential for AI advancement alongside crucial considerations regarding data ethics and fairness.